hazard rate
Preference Models assume Proportional Hazards of Utilities
Modelling human preferences is an important step in modern post-training pipelines for AI alignment. One popular approach to building such models is to assume that human preference rankings follow a Plackett-Luce distribution (Plackett, 1975; Luce et al., 1959). In this monograph, I draw a somewhat remarkable connection between a popular statistical model for estimating lifetimes, the Cox Proportional Hazards model (Cox, 1972), and the Plackett-Luce model, and consequently to algorithms such as Direct Preference Optimization, a popular algorithm for aligning modern artificial intelligence (Ouyang et al., 2022). To the best of my knowledge, at the time of writing the connection between the Proportional Hazards model and the Plackett-Luce model is relatively little known, and the subsequent connections to AI alignment algorithms such as 'Direct Preference Optimization' (Rafailov et al., 2023) are not well appreciated. I believe that explicitly stating this connection will help the AI research community build on existing research in semi-parametric statistics to build better models of human preference.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)
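The claimed connection can be checked numerically: with all-distinct event times, the Cox partial likelihood of a failure order is term-by-term the Plackett-Luce likelihood of that same order as a ranking. A minimal sketch (the scores and ranking below are hypothetical, not from the monograph):

```python
import math

def plackett_luce_loglik(scores, ranking):
    """Log-probability of `ranking` (best first) under Plackett-Luce
    with item weights exp(scores[i])."""
    ll = 0.0
    remaining = list(ranking)
    for item in ranking:
        denom = sum(math.exp(scores[j]) for j in remaining)
        ll += scores[item] - math.log(denom)
        remaining.remove(item)
    return ll

def cox_partial_loglik(scores, event_order):
    """Cox partial log-likelihood with distinct event times: the risk
    set at each event is everyone who has not yet failed."""
    ll = 0.0
    at_risk = list(event_order)
    for subject in event_order:
        denom = sum(math.exp(scores[j]) for j in at_risk)
        ll += scores[subject] - math.log(denom)
        at_risk.remove(subject)
    return ll

scores = {0: 1.3, 1: -0.2, 2: 0.5}   # hypothetical log-utilities / log-hazards
order = [2, 0, 1]                    # preference ranking == failure order
print(plackett_luce_loglik(scores, order) == cox_partial_loglik(scores, order))  # True
```

Each factor is "chosen item's weight over the weights still in play", which is exactly the risk-set normaliser of the Cox model.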
cc3f5463bc4d26bc38eadc8bcffbc654-AuthorFeedback.pdf
We thank all reviewers for their helpful comments. Our responses to each reviewer are below. The reviewer has three major critiques of the paper, which we address in order. The reviewer's concerns center on the NBA experiment, as there is no ground truth hazard rate. We acknowledge that our statement on "avoiding" negative
Spectral Survival Analysis
Shi, Chengzhi, Ioannidis, Stratis
Survival analysis is widely deployed in a diverse set of fields, including healthcare, business, ecology, etc. The Cox Proportional Hazard (CoxPH) model is a semi-parametric model often encountered in the literature. Despite its popularity, wide deployment, and numerous variants, scaling CoxPH to large datasets and deep architectures poses a challenge, especially in the high-dimensional regime. We identify a fundamental connection between rank regression and the CoxPH model: this allows us to adapt and extend the so-called spectral method for rank regression to survival analysis. Our approach is versatile, naturally generalizing to several CoxPH variants, including deep models. We empirically verify our method's scalability on multiple real-world high-dimensional datasets; our method outperforms legacy methods w.r.t. predictive performance and efficiency.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
- Health & Medicine > Diagnostic Medicine > Imaging (0.68)
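The objective that work like this must scale is the Cox negative log partial likelihood. The sketch below (my own variable names, Breslow-style risk sets with no tie correction; it is not the paper's spectral method) shows the sort-plus-running-logsumexp form that makes the loss cheap for linear or deep risk scores:

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood. `risk_scores` is the model
    output (beta^T x for linear CoxPH, a network output for deep
    variants); `events` is 1 for an observed failure, 0 for censoring.
    No tie correction is applied."""
    order = np.argsort(-times)                 # latest event first
    s, e = risk_scores[order], events[order]
    log_risk = np.logaddexp.accumulate(s)      # log sum_{j: t_j >= t_i} exp(s_j)
    return -np.sum((s - log_risk)[e == 1])
```

With all subjects failing at distinct times this is exactly a Plackett-Luce (ranking) negative log-likelihood over the failure order, which is the rank-regression connection the abstract exploits.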
Is there a half-life for the success rates of AI agents?
Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model -- a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task and that each agent could be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks -- that they involve increasingly large sets of subtasks where failing any one fails the task. Whether this model applies more generally on other suites of tasks is unknown and an important subject for further work.
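The constant-hazard model described above fits in a few lines; the function name and the 60-minute half-life below are illustrative, not fitted values from Kwa et al.:

```python
import math

def success_rate(task_minutes, half_life_minutes):
    """A constant per-minute failure rate implies exponentially decaying
    success probability with task length; the half-life pins the rate."""
    lam = math.log(2) / half_life_minutes   # per-minute hazard
    return math.exp(-lam * task_minutes)

# An agent with a 60-minute half-life succeeds about half the time on
# 60-minute tasks and about a quarter of the time on 120-minute tasks.
print(success_rate(60, 60), success_rate(120, 60))
```

The multiplicative structure is the point: doubling task length squares the success rate, consistent with a task being a chain of subtasks where any single failure fails the whole task.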
Deep State-Space Generative Model For Correlated Time-to-Event Predictions
Xue, Yuan, Zhou, Denny, Du, Nan, Dai, Andrew M., Xu, Zhen, Zhang, Kun, Cui, Claire
Capturing the inter-dependencies among multiple types of clinically critical events is critical not only to accurate future event prediction, but also to better treatment planning. In this work, we propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events (e.g., kidney failure, mortality) by explicitly modeling the temporal dynamics of patients' latent states. Based on these learned patient states, we further develop a new general discrete-time formulation of the hazard rate function to estimate the survival distribution of patients with significantly improved accuracy.
Time-to-event prediction (also known as survival analysis) investigates the distribution of time duration until the event of interest happens in the presence of event censorship. In the healthcare domain, it is an essential tool for modeling the risks of critical medical events and capturing the relationship between the covariates and the risks [9]. Recently, machine learning methods have been applied to time-to-event predictions to provide flexible modeling of the time distribution [6, 19, 22], and capture the nonlinear relationship between
- Health & Medicine > Therapeutic Area > Nephrology (0.68)
- Health & Medicine > Health Care Technology > Medical Record (0.46)
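A discrete-time hazard formulation of the kind the abstract mentions turns per-step hazards into a survival curve via a running product; a minimal sketch (function name mine, not the paper's model):

```python
import numpy as np

def survival_from_hazards(hazards):
    """Discrete-time survival curve from per-step hazard rates h_k (the
    probability the event occurs at step k given survival so far):
    S(t) = prod_{k <= t} (1 - h_k)."""
    return np.cumprod(1.0 - np.asarray(hazards, dtype=float))

# Hazards of 10%, 20%, 50% over three steps give survival ~[0.9, 0.72, 0.36].
print(survival_from_hazards([0.1, 0.2, 0.5]))
```

Censoring is handled naturally in this parameterisation: a subject censored at step t contributes S(t) to the likelihood, while an observed event at step t contributes S(t-1) * h_t.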
Tabular and Deep Reinforcement Learning for Gittins Index
Dhankar, Harshit, Mishra, Kshitij, Bodas, Tejas
In the realm of multi-arm bandit problems, the Gittins index policy is known to be optimal in maximizing the expected total discounted reward obtained from pulling the Markovian arms. In most realistic scenarios, however, the Markovian state transition probabilities are unknown and therefore the Gittins indices cannot be computed. One can then resort to reinforcement learning (RL) algorithms that explore the state space to learn these indices while exploiting to maximize the reward collected. In this work, we propose tabular (QGI) and Deep RL (DGN) algorithms for learning the Gittins index that are based on the retirement formulation for the multi-arm bandit problem. When compared with existing RL algorithms that learn the Gittins index, our algorithms have a lower run time, require less storage space (a small Q-table in QGI and a smaller replay buffer in DGN), and show better empirical convergence to the Gittins index. This makes our algorithms well suited for problems with large state spaces and a viable alternative to existing methods. As a key application, we demonstrate the use of our algorithms in minimizing the mean flowtime in a job scheduling problem when jobs are available in batches and have an unknown service time distribution.
Markov decision processes (MDPs) are controlled stochastic processes where a decision maker is required to control the evolution of a Markov chain over its state space by suitably choosing actions that maximize the long-term payoffs. An interesting class of MDPs are the multi-armed bandits (MAB), where, given K Markov chains (each corresponding to a bandit arm), the decision maker is confronted with a K-tuple (the state of each arm) and must choose to pull or activate exactly one arm and collect a corresponding reward.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
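When the transition probabilities are known, the retirement formulation the abstract builds on computes the Gittins index directly: offer a lump-sum retirement reward M and find the M* at which retiring and continuing are equally attractive; the index is (1 - beta) * M*. A sketch for a made-up two-state arm (rewards, transitions, and discount are illustrative; this is the classical known-model computation, not the paper's QGI/DGN learners):

```python
import numpy as np

# Hypothetical two-state arm, chosen only to illustrate the idea.
r = np.array([1.0, 0.0])
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
beta = 0.9

def retirement_value(M, iters=500):
    """Value iteration for the MDP where, at each step, the decision
    maker either retires for a lump sum M or pulls the arm once more."""
    V = np.zeros_like(r)
    for _ in range(iters):
        V = np.maximum(M, r + beta * (P @ V))
    return V

def gittins_index(s, lo=0.0, hi=100.0, tol=1e-8):
    """Binary search for the retirement reward M* making the decision
    maker indifferent in state s; the index is (1 - beta) * M*."""
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = retirement_value(M)
        if (r + beta * (P @ V))[s] > M:   # pulling still beats retiring
            lo = M
        else:
            hi = M
    return (1 - beta) * 0.5 * (lo + hi)

print(gittins_index(0), gittins_index(1))
```

Because state 0 pays the arm's maximal reward, its index equals that reward exactly (stopping after one pull already achieves rate 1.0), which makes a handy sanity check. The RL algorithms in the paper are for the case where P is unknown and this computation is infeasible.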